Efficient decoding strategies for conversational speech recognition using a constrained nonlinear state-space model
نویسندگان
چکیده
In this paper, we present two efficient strategies for likelihood computation and decoding in a continuous speech recognizer using an underlying nonlinear state-space dynamic model for the hidden speech dynamics. The state-space model has been specially constructed so as to be suitable for the conversational or casual style of speech where phonetic reduction abounds. Two specific decoding algorithms, based on optimal state-sequence estimation for the nonlinear state-space model, are derived, implemented, and evaluated. They successfully overcome the exponential growth in the original search paths by using the path-merging approaches derived from Bayes’ rule. We have tested and compared the two algorithms using the speech data from the Switchboard corpus, confirming their effectiveness. Conversational speech recognition experiments using the Switchboard corpus further demonstrated that the use of the new decoding strategies is capable of reducing the recognizer’s word error rate compared with two baseline recognizers, including the HMM system and the nonlinear state-space model using the HMM-produced phonetic boundaries, under identical test conditions.
منابع مشابه
Efficient decoding strategy for conversational speech recognition using state-space models for vocal-tract-resonance dynamics
In this paper, we present an efficient strategy for likelihood computation and decoding in a continuous speech recognizer using underlying state-space dynamic models for the hidden speech dynamics. The state-space models have been constructed in a special way so as to be suitable for the conversational or casual style of speech where phonetic reduction abounds. The interacting multiple model (I...
متن کاملUsing Continuous Space Language Models for Conversational Speech Recognition
Language modeling for conversational speech suffers from the limited amount of available adequate training data. This paper describes a new approach that performs the estimation of the language model probabilities in a continuous space, allowing by these means smooth interpolation of unobserved n-grams. This continuous space language model is used during the last decoding pass of a state-of-the...
متن کاملConversational telephone speech recognition
This paper describes the development of a speech recognition system for the processing of telephone conversations, starting with a state-of-the-art broadcast news transcription system. We identify major changes and improvements in acoustic and language modeling, as well as decoding, which are required to achieve state-of-theart performance on conversational speech. Some major changes on the aco...
متن کاملNew pruning criteria for efficient decoding
In large vocabulary continuous speech recognizers the search space needs to be constrained efficiently to make the recognition task feasible. Beam pruning and restricting the number of active paths are the most widely applied techniques for this. In this paper, we present three additional pruning criteria, which can be used to further limit the search space. These new criteria take into account...
متن کاملAttention shift decoding for conversational speech recognition
We introduce a novel approach to decoding in speech recognition (termed attention-shift decoding) that attempts to mimic aspects of human speech recognition responsible for robustness in processing conversational speech. Our approach is a radical departure from traditional decoding algorithms for speech recognition. We propose a method to first identify reliable regions of the speech signal and...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- IEEE Trans. Speech and Audio Processing
دوره 11 شماره
صفحات -
تاریخ انتشار 2003